<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8" />
<title>NASM - The Netwide Assembler</title>
<link href="nasmdoc.css" rel="stylesheet" type="text/css" />
<link href="local.css" rel="stylesheet" type="text/css" />
</head>
<body>
<ul class="navbar">
<li class="first"><a class="prev" href="nasmdoc8.html">Chapter 8</a></li>
<li><a class="next" href="nasmdo10.html">Chapter 10</a></li>
<li><a class="toc" href="nasmdoc0.html">Contents</a></li>
<li class="last"><a class="index" href="nasmdoci.html">Index</a></li>
</ul>
<div class="title">
<h1>NASM - The Netwide Assembler</h1>
<span class="subtitle">version 2.16.03</span>
</div>
<div class="contents">
<h2 id="chapter-9">Chapter 9: Writing 16-bit Code (DOS, Windows 3/3.1)</h2>
<p>This chapter attempts to cover some of the common issues encountered
when writing 16-bit code to run under <code>MS-DOS</code> or
<code>Windows 3.x</code>. It covers how to link programs to produce
<code>.EXE</code> or <code>.COM</code> files, how to write
<code>.SYS</code> device drivers, and how to interface assembly language
code with 16-bit C compilers and with Borland Pascal.</p>
<h3 id="section-9.1">9.1 Producing <code>.EXE</code> Files</h3>
<p>Any large program written under DOS needs to be built as a
<code>.EXE</code> file: only <code>.EXE</code> files have the internal
structure required to span more than one 64K segment. Windows
programs, also, have to be built as <code>.EXE</code> files, since Windows
does not support the <code>.COM</code> format.</p>
<p>In general, you generate <code>.EXE</code> files by using the
<code>obj</code> output format to produce one or more <code>.obj</code>
files, and then linking them together using a linker. However, NASM also
supports the direct generation of simple DOS <code>.EXE</code> files using
the <code>bin</code> output format (by using <code>DB</code> and
<code>DW</code> to construct the <code>.EXE</code> file header), and a
macro package is supplied to do this. Thanks to Yann Guidon for
contributing the code for this.</p>
<p>NASM may also support <code>.EXE</code> natively as another output
format in future releases.</p>
<h4 id="section-9.1.1">9.1.1 Using the <code>obj</code> Format To Generate <code>.EXE</code> Files</h4>
<p>This section describes the usual method of generating <code>.EXE</code>
files by linking <code>.OBJ</code> files together.</p>
<p>Most 16-bit programming language packages come with a suitable linker;
if you have none of these, there is a free linker called VAL, available in
<code>LZH</code> archive format from
<a href="ftp://x2ftp.oulu.fi/pub/msdos/programming/lang/"><code>x2ftp.oulu.fi</code></a>.
An LZH archiver can be found at
<a href="ftp://ftp.simtel.net/pub/simtelnet/msdos/arcers"><code>ftp.simtel.net</code></a>.
There is another &lsquo;free&rsquo; linker (though this one doesn't come with sources)
called FREELINK, available from
<a href="http://www.pcorner.com/tpc/old/3-101.html"><code>www.pcorner.com</code></a>.
A third, <code>djlink</code>, written by DJ Delorie, is available at
<a href="http://www.delorie.com/djgpp/16bit/djlink/"><code>www.delorie.com</code></a>.
A fourth linker, <code>ALINK</code>, written by Anthony A.J. Williams, is
available at
<a href="http://alink.sourceforge.net"><code>alink.sourceforge.net</code></a>.</p>
<p>When linking several <code>.OBJ</code> files into a <code>.EXE</code>
file, you should ensure that exactly one of them has a start point defined
(using the <code>..start</code> special symbol defined by the
<code>obj</code> format: see <a href="nasmdoc8.html#section-8.4.6">section
8.4.6</a>). If no module defines a start point, the linker will not know
what value to give the entry-point field in the output file header; if more
than one defines a start point, the linker will not know <em>which</em>
value to use.</p>
<p>An example of a NASM source file which can be assembled to a
<code>.OBJ</code> file and linked on its own to a <code>.EXE</code> is
given here. It demonstrates the basic principles of defining a stack,
initialising the segment registers, and declaring a start point. This file
is also provided in the <code>test</code> subdirectory of the NASM
archives, under the name <code>objexe.asm</code>.</p>
<pre>
segment code 

..start: 
        mov     ax,data 
        mov     ds,ax 
        mov     ax,stack 
        mov     ss,ax 
        mov     sp,stacktop
</pre>
<p>This initial piece of code sets up <code>DS</code> to point to the data
segment, and initializes <code>SS</code> and <code>SP</code> to point to
the top of the provided stack. Note that the processor implicitly disables
interrupts for one instruction after a move into <code>SS</code>, precisely
for this situation: no interrupt can occur between the loads of
<code>SS</code> and <code>SP</code> and find itself without a valid stack
to execute on.</p>
<p>Note also that the special symbol <code>..start</code> is defined at the
beginning of this code, which makes that location the entry point into the
resulting executable file.</p>
<pre>
        mov     dx,hello 
        mov     ah,9 
        int     0x21
</pre>
<p>The above is the main program: load <code>DS:DX</code> with a pointer to
the greeting message (<code>hello</code> is implicitly relative to the
segment <code>data</code>, which was loaded into <code>DS</code> in the
setup code, so the full pointer is valid), and call the DOS print-string
function.</p>
<pre>
        mov     ax,0x4c00 
        int     0x21
</pre>
<p>This terminates the program using another DOS system call.</p>
<pre>
segment data 

hello:  db      'hello, world', 13, 10, '$'
</pre>
<p>The data segment contains the string we want to display.</p>
<pre>
segment stack stack 
        resb 64 
stacktop:
</pre>
<p>The above code declares a stack segment containing 64 bytes of
uninitialized stack space, and points <code>stacktop</code> at the top of
it. The directive <code>segment stack stack</code> defines a segment
<em>called</em> <code>stack</code>, and also of <em>type</em>
<code>STACK</code>. The latter is not necessary to the correct running of
the program, but linkers are likely to issue warnings or errors if your
program has no segment of type <code>STACK</code>.</p>
<p>The above file, when assembled into a <code>.OBJ</code> file, will link
on its own to a valid <code>.EXE</code> file, which when run will print
&lsquo;hello, world&rsquo; and then exit.</p>
<h4 id="section-9.1.2">9.1.2 Using the <code>bin</code> Format To Generate <code>.EXE</code> Files</h4>
<p>The <code>.EXE</code> file format is simple enough that it's possible to
build a <code>.EXE</code> file by writing a pure-binary program and
sticking a 32-byte header on the front. This header is simple enough that
it can be generated using <code>DB</code> and <code>DW</code> commands by
NASM itself, so that you can use the <code>bin</code> output format to
directly generate <code>.EXE</code> files.</p>
<p>Included in the NASM archives, in the <code>misc</code> subdirectory, is
a file <code>exebin.mac</code> of macros. It defines three macros:
<code>EXE_begin</code>, <code>EXE_stack</code> and <code>EXE_end</code>.</p>
<p>To produce a <code>.EXE</code> file using this method, you should start
by using <code>%include</code> to load the <code>exebin.mac</code> macro
package into your source file. You should then issue the
<code>EXE_begin</code> macro call (which takes no arguments) to generate
the file header data. Then write code as normal for the <code>bin</code>
format &ndash; you can use all three standard sections <code>.text</code>,
<code>.data</code> and <code>.bss</code>. At the end of the file you should
call the <code>EXE_end</code> macro (again, no arguments), which defines
some symbols to mark section sizes, and these symbols are referred to in
the header code generated by <code>EXE_begin</code>.</p>
<p>In this model, the code you end up writing starts at <code>0x100</code>,
just like a <code>.COM</code> file &ndash; in fact, if you strip off the
32-byte header from the resulting <code>.EXE</code> file, you will have a
valid <code>.COM</code> program. All the segment bases are the same, so you
are limited to a 64K program, again just like a <code>.COM</code> file.
Note that an <code>ORG</code> directive is issued by the
<code>EXE_begin</code> macro, so you should not explicitly issue one of
your own.</p>
<p>You can't directly refer to your segment base value, unfortunately,
since this would require a relocation in the header, and things would get a
lot more complicated. So you should get your segment base by copying it out
of <code>CS</code> instead.</p>
<p>On entry to your <code>.EXE</code> file, <code>SS:SP</code> are already
set up to point to the top of a 2 KB stack. You can adjust the default stack
size of 2 KB by calling the <code>EXE_stack</code> macro. For example, to
change the stack size of your program to 64 bytes, you would call
<code>EXE_stack 64</code>.</p>
<p>A sample program which generates a <code>.EXE</code> file in this way is
given in the <code>test</code> subdirectory of the NASM archive, as
<code>binexe.asm</code>.</p>
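<p>The steps described in this section can be sketched as follows. This is
a minimal illustration rather than a copy of the supplied sample: the file
name <code>myexe.asm</code> and the include path in the build comment are
assumptions, and you should adjust them to wherever <code>exebin.mac</code>
lives on your system.</p>
<pre>
; Possible build command (the include path is an assumption):
;   nasm -fbin myexe.asm -o myexe.exe -imisc/

%include "exebin.mac"

        EXE_begin               ; emits the header and issues the ORG
        EXE_stack 64            ; optional: override the default stack size

        section .text

        mov     ax,cs           ; segment base cannot be named directly,
        mov     ds,ax           ; so copy it out of CS

        mov     dx,hello
        mov     ah,9            ; DOS print-string function
        int     0x21

        mov     ax,0x4c00       ; terminate program
        int     0x21

        section .data

hello:  db      'hello, world', 13, 10, '$'

        EXE_end                 ; defines the size symbols used by the header
</pre>
<p>Note that no <code>ORG</code> directive appears in the sketch, since
<code>EXE_begin</code> supplies one.</p>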
<h3 id="section-9.2">9.2 Producing <code>.COM</code> Files</h3>
<p>While large DOS programs must be written as <code>.EXE</code> files,
small ones are often better written as <code>.COM</code> files.
<code>.COM</code> files are pure binary, and therefore most easily produced
using the <code>bin</code> output format.</p>
<h4 id="section-9.2.1">9.2.1 Using the <code>bin</code> Format To Generate <code>.COM</code> Files</h4>
<p><code>.COM</code> files expect to be loaded at offset <code>100h</code>
into their segment (though the segment may change). Execution then begins
at <code>100h</code>, i.e. right at the start of the program. So to write a
<code>.COM</code> program, you would create a source file looking like</p>
<pre>
        org 100h 

section .text 

start: 
        ; put your code here 

section .data 

        ; put data items here 

section .bss 

        ; put uninitialized data here
</pre>
<p>The <code>bin</code> format puts the <code>.text</code> section first in
the file, so you can declare data or BSS items before beginning to write
code if you want to and the code will still end up at the front of the file
where it belongs.</p>
<p>The BSS (uninitialized data) section does not take up space in the
<code>.COM</code> file itself: instead, addresses of BSS items are resolved
to point at space beyond the end of the file, on the grounds that this will
be free memory when the program is run. Therefore you should not rely on
your BSS being initialized to all zeros when you run.</p>
<p>To assemble the above program, you should use a command line like</p>
<pre>
nasm myprog.asm -fbin -o myprog.com
</pre>
<p>The <code>bin</code> format would produce a file called
<code>myprog</code> if no explicit output file name were specified, so you
have to override it and give the desired file name.</p>
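<p>As an illustration, the skeleton above might be filled in like this to
give a complete program. The label names are arbitrary, and the program
relies on <code>DS</code> already being equal to <code>CS</code> when a
<code>.COM</code> file starts running.</p>
<pre>
        org 100h

section .text

start:
        mov     dx,hello        ; DS:DX points to the message
        mov     ah,9            ; DOS print-string function
        int     0x21

        mov     ax,0x4c00       ; terminate program
        int     0x21

section .data

hello:  db      'hello, world', 13, 10, '$'
</pre>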
<h4 id="section-9.2.2">9.2.2 Using the <code>obj</code> Format To Generate <code>.COM</code> Files</h4>
<p>If you are writing a <code>.COM</code> program as more than one module,
you may wish to assemble several <code>.OBJ</code> files and link them
together into a <code>.COM</code> program. You can do this, provided you
have a linker capable of outputting <code>.COM</code> files directly (TLINK
does this), or alternatively a converter program such as
<code>EXE2BIN</code> to transform the <code>.EXE</code> file output from
the linker into a <code>.COM</code> file.</p>
<p>If you do this, you need to take care of several things:</p>
<ul>
<li>
<p>The first object file containing code should start its code segment with
a line like <code>RESB 100h</code>. This is to ensure that the code begins
at offset <code>100h</code> relative to the beginning of the code segment,
so that the linker or converter program does not have to adjust address
references within the file when generating the <code>.COM</code> file.
Other assemblers use an <code>ORG</code> directive for this purpose, but
<code>ORG</code> in NASM is a format-specific directive to the
<code>bin</code> output format, and does not mean the same thing as it does
in MASM-compatible assemblers.</p>
</li>
<li>
<p>You don't need to define a stack segment.</p>
</li>
<li>
<p>All your segments should be in the same group, so that every time your
code or data references a symbol offset, all offsets are relative to the
same segment base. This is because, when a <code>.COM</code> file is
loaded, all the segment registers contain the same value.</p>
</li>
</ul>
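<p>A single-module sketch of these points is shown below. The segment and
group names (<code>code</code>, <code>data</code>, <code>dgroup</code>) are
illustrative assumptions, and the exact requirements of your linker or
converter may differ, so treat this as a starting point only.</p>
<pre>
segment code

        resb    100h            ; code must begin at offset 100h

..start:
        mov     dx,hello        ; offset is relative to the group base
        mov     ah,9            ; DOS print-string function
        int     0x21

        mov     ax,0x4c00       ; terminate program
        int     0x21

segment data

hello:  db      'hello, world', 13, 10, '$'

group   dgroup  code data       ; one group, so all offsets share a base
                                ; no stack segment is defined
</pre>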
<h3 id="section-9.3">9.3 Producing <code>.SYS</code> Files</h3>
<p>MS-DOS device drivers &ndash; <code>.SYS</code> files &ndash; are pure
binary files, similar to <code>.COM</code> files, except that they start at
origin zero rather than <code>100h</code>. Therefore, if you are writing a
device driver using the <code>bin</code> format, you do not need the
<code>ORG</code> directive, since the default origin for <code>bin</code>
is zero. Similarly, if you are using <code>obj</code>, you do not need the
<code>RESB 100h</code> at the start of your code segment.</p>
<p><code>.SYS</code> files start with a header structure, containing
pointers to the various routines inside the driver which do the work. This
structure should be defined at the start of the code segment, even though
it is not actually code.</p>
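<p>The following sketch shows roughly what such a header might look like
when using the <code>bin</code> format. It is not a working driver: the
field layout follows the commonly documented DOS device header, the device
name <code>MYDEV</code> and all labels are hypothetical, and the routines
are stubs. Check the details against one of the references mentioned below
before relying on them.</p>
<pre>
section .text                   ; no ORG: bin defaults to origin zero

header:
        dd      -1              ; link to next driver, filled in by DOS
        dw      0x8000          ; attribute word (bit 15: character device)
        dw      strategy        ; offset of the strategy routine
        dw      interrupt       ; offset of the interrupt routine
        db      'MYDEV   '      ; device name, space-padded to 8 bytes

reqhdr: dd      0               ; saved far pointer to the request header

strategy:
        mov     [cs:reqhdr],bx  ; DOS passes the request header in ES:BX
        mov     [cs:reqhdr+2],es
        retf

interrupt:
        ; a real driver decodes the saved request here and fills in
        ; its status and result fields
        retf
</pre>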
<p>For more information on the format of <code>.SYS</code> files, and the
data which has to go in the header structure, a list of books is given in
the Frequently Asked Questions list for the newsgroup
<a href="news:comp.os.msdos.programmer"><code>comp.os.msdos.programmer</code></a>.</p>
<h3 id="section-9.4">9.4 Interfacing to 16-bit C Programs</h3>
<p>This section covers the basics of writing assembly routines that call,
or are called from, C programs. To do this, you would typically write an
assembly module as a <code>.OBJ</code> file, and link it with your C
modules to produce a mixed-language program.</p>
<h4 id="section-9.4.1">9.4.1 External Symbol Names</h4>
<p>C compilers have the convention that the names of all global symbols
(functions or data) they define are formed by prefixing an underscore to
the name as it appears in the C program. So, for example, the function a C
programmer thinks of as <code>printf</code> appears to an assembly language
programmer as <code>_printf</code>. This means that in your assembly
programs, you can define symbols without a leading underscore, and not have
to worry about name clashes with C symbols.</p>
<p>If you find the underscores inconvenient, you can define macros to
replace the <code>GLOBAL</code> and <code>EXTERN</code> directives as
follows:</p>
<pre>
%macro  cglobal 1 

  global  _%1 
  %define %1 _%1 

%endmacro 

%macro  cextern 1 

  extern  _%1 
  %define %1 _%1 

%endmacro
</pre>
<p>(These forms of the macros only take one argument at a time; a
<code>%rep</code> construct could solve this.)</p>
<p>If you then declare an external like this:</p>
<pre>
cextern printf
</pre>
<p>then the macro will expand it as</p>
<pre>
extern  _printf 
%define printf _printf
</pre>
<p>Thereafter, you can reference <code>printf</code> as if it were a symbol,
and the preprocessor will put the leading underscore on where necessary.</p>
<p>The <code>cglobal</code> macro works similarly. You must use
<code>cglobal</code> before defining the symbol in question, but you would
have had to do that anyway if you used <code>GLOBAL</code>.</p>
<p>Also see <a href="nasmdoc2.html#section-2.1.28">section 2.1.28</a>.</p>
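<p>One possible way to lift the one-argument restriction on these macros is
to loop over the parameters with <code>%rep</code> and
<code>%rotate</code>. The following is a sketch of that idea, not a form
supplied with NASM:</p>
<pre>
%macro  cglobal 1-*

  %rep %0
    global  _%1
    %define %1 _%1
    %rotate 1                   ; move on to the next parameter
  %endrep

%endmacro

%macro  cextern 1-*

  %rep %0
    extern  _%1
    %define %1 _%1
    %rotate 1
  %endrep

%endmacro
</pre>
<p>With these definitions, a line such as <code>cextern printf, puts</code>
would declare both <code>_printf</code> and <code>_puts</code> in one
call.</p>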
<h4 id="section-9.4.2">9.4.2 Memory Models</h4>
<p>NASM contains no mechanism to support the various C memory models
directly; you have to keep track yourself of which one you are writing for.
This means you have to keep track of the following things:</p>
<ul>
<li>
<p>In models using a single code segment (tiny, small and compact),
functions are near. This means that function pointers, when stored in data
segments or pushed on the stack as function arguments, are 16 bits long and
contain only an offset field (the <code>CS</code> register never changes
its value, and always gives the segment part of the full function address),
and that functions are called using ordinary near <code>CALL</code>
instructions and return using <code>RETN</code> (which, in NASM, is
synonymous with <code>RET</code> anyway). This means both that you should
write your own routines to return with <code>RETN</code>, and that you
should call external C routines with near <code>CALL</code> instructions.</p>
</li>
<li>
<p>In models using more than one code segment (medium, large and huge),
functions are far. This means that function pointers are 32 bits long
(consisting of a 16-bit offset followed by a 16-bit segment), and that
functions are called using <code>CALL FAR</code> (or
<code>CALL seg:offset</code>) and return using <code>RETF</code>. Again,
you should therefore write your own routines to return with
<code>RETF</code> and use <code>CALL FAR</code> to call external routines.</p>
</li>
<li>
<p>In models using a single data segment (tiny, small and medium), data
pointers are 16 bits long, containing only an offset field (the
<code>DS</code> register doesn't change its value, and always gives the
segment part of the full data item address).</p>
</li>
<li>
<p>In models using more than one data segment (compact, large and huge),
data pointers are 32 bits long, consisting of a 16-bit offset followed by a
16-bit segment. You should still be careful not to modify <code>DS</code>
in your routines without restoring it afterwards, but <code>ES</code> is
free for you to use to access the contents of 32-bit data pointers you are
passed.</p>
</li>
<li>
<p>The huge memory model allows single data items to exceed 64K in size. In
all other memory models, you can access the whole of a data item just by
doing arithmetic on the offset field of the pointer you are given, whether
a segment field is present or not; in huge model, you have to be more
careful of your pointer arithmetic.</p>
</li>
<li>
<p>In most memory models, there is a <em>default</em> data segment, whose
segment address is kept in <code>DS</code> throughout the program. This
data segment is typically the same segment as the stack, kept in
<code>SS</code>, so that functions' local variables (which are stored on
the stack) and global data items can both be accessed easily without
changing <code>DS</code>. Particularly large data items are typically
stored in other segments. However, some memory models (though not the
standard ones, usually) allow the assumption that <code>SS</code> and
<code>DS</code> hold the same value to be removed. Be careful about
functions' local variables in this latter case.</p>
</li>
</ul>
<p>In models with a single code segment, the segment is called
<code>_TEXT</code>, so your code segment must also go by this name in order
to be linked into the same place as the main code segment. In models with a
single data segment, or with a default data segment, it is called
<code>_DATA</code>.</p>
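<p>As a small illustration of the rules above, here is a sketch of a
compact-model routine (near code, far data) that reads an integer through
a far data pointer. The function name <code>_deref</code> is hypothetical,
and the parameter offsets assume the stack frame layout described in
<a href="#section-9.4.3">section 9.4.3</a>.</p>
<pre>
segment _TEXT

global  _deref

; int deref(int *p);            compact model: p is a far pointer
_deref:
        push    bp
        mov     bp,sp
        les     bx,[bp+4]       ; BX = offset at [bp+4], ES = segment at [bp+6]
        mov     ax,[es:bx]      ; read through ES, leaving DS untouched
        pop     bp
        retn                    ; near return, since code is near
</pre>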
<h4 id="section-9.4.3">9.4.3 Function Definitions and Function Calls</h4>
<p>The C calling convention in 16-bit programs is as follows. In the
following description, the words <em>caller</em> and <em>callee</em> are
used to denote the function doing the calling and the function which gets
called.</p>
<ul>
<li>
<p>The caller pushes the function's parameters on the stack, one after
another, in reverse order (right to left, so that the first argument
specified to the function is pushed last).</p>
</li>
<li>
<p>The caller then executes a <code>CALL</code> instruction to pass control
to the callee. This <code>CALL</code> is either near or far depending on
the memory model.</p>
</li>
<li>
<p>The callee receives control, and typically (although this is not
actually necessary, in functions which do not need to access their
parameters) starts by saving the value of <code>SP</code> in
<code>BP</code> so as to be able to use <code>BP</code> as a base pointer
to find its parameters on the stack. However, the caller was probably doing
this too, so part of the calling convention states that <code>BP</code>
must be preserved by any C function. Hence the callee, if it is going to
set up <code>BP</code> as a <em>frame pointer</em>, must push the previous
value first.</p>
</li>
<li>
<p>The callee may then access its parameters relative to <code>BP</code>.
The word at <code>[BP]</code> holds the previous value of <code>BP</code>
as it was pushed; the next word, at <code>[BP+2]</code>, holds the offset
part of the return address, pushed implicitly by <code>CALL</code>. In a
small-model (near) function, the parameters start after that, at
<code>[BP+4]</code>; in a large-model (far) function, the segment part of
the return address lives at <code>[BP+4]</code>, and the parameters begin
at <code>[BP+6]</code>. The leftmost parameter of the function, since it
was pushed last, is accessible at this offset from <code>BP</code>; the
others follow, at successively greater offsets. Thus, in a function such as
<code>printf</code> which takes a variable number of parameters, the
pushing of the parameters in reverse order means that the function knows
where to find its first parameter, which tells it the number and type of
the remaining ones.</p>
</li>
<li>
<p>The callee may also wish to decrease <code>SP</code> further, so as to
allocate space on the stack for local variables, which will then be
accessible at negative offsets from <code>BP</code>.</p>
</li>
<li>
<p>The callee, if it wishes to return a value to the caller, should leave
the value in <code>AL</code>, <code>AX</code> or <code>DX:AX</code>
depending on the size of the value. Floating-point results are sometimes
(depending on the compiler) returned in <code>ST0</code>.</p>
</li>
<li>
<p>Once the callee has finished processing, it restores <code>SP</code>
from <code>BP</code> if it had allocated local stack space, then pops the
previous value of <code>BP</code>, and returns via <code>RETN</code> or
<code>RETF</code> depending on memory model.</p>
</li>
<li>
<p>When the caller regains control from the callee, the function parameters
are still on the stack, so it typically adds an immediate constant to
<code>SP</code> to remove them (instead of executing a number of slow
<code>POP</code> instructions). Thus, if a function is accidentally called
with the wrong number of parameters due to a prototype mismatch, the stack
will still be returned to a sensible state since the caller, which
<em>knows</em> how many parameters it pushed, does the removing.</p>
</li>
</ul>
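<p>To make the parameter offsets and the <code>DX:AX</code> return
convention concrete, here is a sketch of a small-model function adding two
<code>long</code> values. The name <code>_addl</code> is hypothetical, and
the sketch assumes the compiler passes each <code>long</code> as two words
with the low word at the lower address.</p>
<pre>
global  _addl

; long addl(long a, long b);    small model
_addl:
        push    bp
        mov     bp,sp
        mov     ax,[bp+4]       ; low word of a
        mov     dx,[bp+6]       ; high word of a
        add     ax,[bp+8]       ; add low word of b
        adc     dx,[bp+10]      ; add high word of b, plus carry
        pop     bp
        ret                     ; result in DX:AX; caller removes 8 bytes
</pre>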
<p>It is instructive to compare this calling convention with that for
Pascal programs (described in <a href="#section-9.5.1">section 9.5.1</a>).
Pascal has a simpler convention, since no functions have variable numbers
of parameters. Therefore the callee knows how many parameters it should
have been passed, and is able to deallocate them from the stack itself by
passing an immediate argument to the <code>RET</code> or <code>RETF</code>
instruction, so the caller does not have to do it. Also, the parameters are
pushed in left-to-right order, not right-to-left, which means that a
compiler can give better guarantees about sequence points without
performance suffering.</p>
<p>Thus, you would define a function in C style in the following way. The
following example is for small model:</p>
<pre>
global  _myfunc 

_myfunc: 
        push    bp 
        mov     bp,sp 
        sub     sp,0x40         ; 64 bytes of local stack space 
        mov     bx,[bp+4]       ; first parameter to function 

        ; some more code 

        mov     sp,bp           ; undo "sub sp,0x40" above 
        pop     bp 
        ret
</pre>
<p>For a large-model function, you would replace <code>RET</code> by
<code>RETF</code>, and look for the first parameter at <code>[BP+6]</code>
instead of <code>[BP+4]</code>. Of course, if one of the parameters is a
pointer, then the offsets of <em>subsequent</em> parameters will change
depending on the memory model as well: far pointers take up four bytes on
the stack when passed as a parameter, whereas near pointers take up two.</p>
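<p>Applying those two changes to the small-model example gives a
large-model sketch like the following (the name <code>_myfarfunc</code> is
illustrative):</p>
<pre>
global  _myfarfunc

_myfarfunc:
        push    bp
        mov     bp,sp
        sub     sp,0x40         ; 64 bytes of local stack space
        mov     bx,[bp+6]       ; first parameter: [bp+4] holds the
                                ; segment part of the return address

        ; some more code

        mov     sp,bp           ; undo "sub sp,0x40" above
        pop     bp
        retf                    ; far return
</pre>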
<p>At the other end of the process, to call a C function from your assembly
code, you would do something like this:</p>
<pre>
extern  _printf 

      ; and then, further down... 

      push    word [myint]        ; one of my integer variables 
      push    word mystring       ; pointer into my data segment 
      call    _printf 
      add     sp,byte 4           ; `byte' saves space 

      ; then those data items... 

segment _DATA 

myint         dw    1234 
mystring      db    'This number -&gt; %d &lt;- should be 1234',10,0
</pre>
<p>This piece of code is the small-model assembly equivalent of the C code</p>
<pre>
    int myint = 1234; 
    printf("This number -&gt; %d &lt;- should be 1234\n", myint);
</pre>
<p>In large model, the function-call code might look more like this. In
this example, it is assumed that <code>DS</code> already holds the segment
base of the segment <code>_DATA</code>. If not, you would have to
initialize it first.</p>
<pre>
      push    word [myint] 
      push    word seg mystring   ; Now push the segment, and... 
      push    word mystring       ; ... offset of "mystring" 
      call    far _printf 
      add     sp,byte 6
</pre>
<p>The integer value still takes up one word on the stack, since large
model does not affect the size of the <code>int</code> data type. The first
argument (pushed last) to <code>printf</code>, however, is a data pointer,
and therefore has to contain a segment and offset part. The segment should
be stored second in memory, and therefore must be pushed first. (Of course,
<code>PUSH DS</code> would have been a shorter instruction than
<code>PUSH WORD SEG mystring</code>, if <code>DS</code> was set up as the
above example assumed.) Then the actual call becomes a far call, since
functions expect far calls in large model; and <code>SP</code> has to be
increased by 6 rather than 4 afterwards to make up for the extra word of
parameters.</p>
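<p>If <code>DS</code> does already hold the segment of <code>_DATA</code>,
the shorter form mentioned above might be sketched like this:</p>
<pre>
      push    word [myint]
      push    ds                  ; segment part of the far pointer
      push    word mystring       ; offset part
      call    far _printf
      add     sp,byte 6           ; three words of parameters
</pre>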
<h4 id="section-9.4.4">9.4.4 Accessing Data Items</h4>
<p>To get at the contents of C variables, or to declare variables which C
can access, you need only declare the names as <code>GLOBAL</code> or
<code>EXTERN</code>. (Again, the names require leading underscores, as
stated in <a href="#section-9.4.1">section 9.4.1</a>.) Thus, a C variable
declared as <code>int i</code> can be accessed from assembler as</p>
<pre>
extern _i 

        mov ax,[_i]
</pre>
<p>And to declare your own integer variable which C programs can access as
<code>extern int j</code>, you do this (making sure you are assembling in
the <code>_DATA</code> segment, if necessary):</p>
<pre>
global  _j 

_j      dw      0
</pre>
<p>To access a C array, you need to know the size of the components of the
array. For example, <code>int</code> variables are two bytes long, so if a
C program declares an array as <code>int a[10]</code>, you can access
<code>a[3]</code> by coding <code>mov ax,[_a+6]</code>. (The byte offset 6
is obtained by multiplying the desired array index, 3, by the size of the
array element, 2.) The sizes of the C base types in 16-bit compilers are: 1
for <code>char</code>, 2 for <code>short</code> and <code>int</code>, 4 for
<code>long</code> and <code>float</code>, and 8 for <code>double</code>.</p>
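<p>When the index is only known at run time, the same multiplication can be
done in a register. The sketch below assumes a near data model and uses a
constant index purely for illustration:</p>
<pre>
extern  _a                      ; declared in C as: int a[10];

        mov     bx,3            ; desired array index
        shl     bx,1            ; scale by the element size, 2
        mov     ax,[_a+bx]      ; AX = a[3]
</pre>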
<p>To access a C data structure, you need to know the offset from the base
of the structure to the field you are interested in. You can either do this
by converting the C structure definition into a NASM structure definition
(using <code>STRUC</code>), or by calculating the one offset and using just
that.</p>
<p>To do either of these, you should read your C compiler's manual to find
out how it organizes data structures. NASM gives no special alignment to
structure members in its own <code>STRUC</code> macro, so you have to
specify alignment yourself if the C compiler generates it. Typically, you
might find that a structure like</p>
<pre>
struct { 
    char c; 
    int i; 
} foo;
</pre>
<p>might be four bytes long rather than three, since the <code>int</code>
field would be aligned to a two-byte boundary. However, this sort of
feature tends to be a configurable option in the C compiler, either using
command-line options or <code>#pragma</code> lines, so you have to find out
how your own compiler does it.</p>
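<p>Assuming the compiler does align the <code>int</code> field to a
two-byte boundary, a matching NASM definition might look like the sketch
below. The structure name <code>foo_t</code> is hypothetical; the padding
is requested explicitly with <code>ALIGNB</code>, since
<code>STRUC</code> adds none of its own.</p>
<pre>
struc   foo_t
  .c:   resb    1               ; char c, at offset 0
        alignb  2               ; pad to a two-byte boundary
  .i:   resw    1               ; int i, at offset 2
endstruc

extern  _foo

        mov     al,[_foo + foo_t.c]     ; AL = foo.c
        mov     bx,[_foo + foo_t.i]     ; BX = foo.i
</pre>
<p>If your compiler packs the structure instead, dropping the
<code>alignb</code> line gives the three-byte layout.</p>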
<h4 id="section-9.4.5">9.4.5 <code>c16.mac</code>: Helper Macros for the 16-bit C Interface</h4>
<p>Included in the NASM archives, in the <code>misc</code> directory, is a
file <code>c16.mac</code> of macros. It defines three macros:
<code>proc</code>, <code>arg</code> and <code>endproc</code>. These are
intended to be used for C-style procedure definitions, and they automate a
lot of the work involved in keeping track of the calling convention.</p>
<p>(An alternative, TASM-compatible form of <code>arg</code> is also now
built into NASM's preprocessor. See
<a href="nasmdoc4.html#section-4.10">section 4.10</a> for details.)</p>
<p>An example of an assembly function using the macro set is given here:</p>
<pre>
proc    _nearproc 

%$i     arg 
%$j     arg 
        mov     ax,[bp + %$i] 
        mov     bx,[bp + %$j] 
        add     ax,[bx] 

endproc
</pre>
<p>This defines <code>_nearproc</code> to be a procedure taking two
arguments, the first (<code>i</code>) an integer and the second
(<code>j</code>) a pointer to an integer. It returns <code>i + *j</code>.</p>
<p>Note that the <code>arg</code> macro has an <code>EQU</code> as the
first line of its expansion, and since the label before the macro call gets
prepended to the first line of the expanded macro, the <code>EQU</code>
works, defining <code>%$i</code> to be an offset from <code>BP</code>. A
context-local variable is used, local to the context pushed by the
<code>proc</code> macro and popped by the <code>endproc</code> macro, so
that the same argument name can be used in later procedures. Of course, you
don't <em>have</em> to do that.</p>
<p>The macro set produces code for near functions (tiny, small and
compact-model code) by default. You can have it generate far functions
(medium, large and huge-model code) by means of coding
<code>%define FARCODE</code>. This changes the kind of return instruction
generated by <code>endproc</code>, and also changes the starting point for
the argument offsets. The macro set contains no intrinsic dependency on
whether data pointers are far or not.</p>
<p><code>arg</code> can take an optional parameter, giving the size of the
argument. If no size is given, 2 is assumed, since it is likely that many
function parameters will be of type <code>int</code>.</p>
<p>The large-model equivalent of the above function would look like this:</p>
<pre>
%define FARCODE 

proc    _farproc 

%$i     arg 
%$j     arg     4 
        mov     ax,[bp + %$i] 
        mov     bx,[bp + %$j] 
        mov     es,[bp + %$j + 2] 
        add     ax,[es:bx] 

endproc
</pre>
<p>This makes use of the argument to the <code>arg</code> macro to define a
parameter of size 4, because <code>j</code> is now a far pointer. When we
load from <code>j</code>, we must load a segment and an offset, and then
read through <code>ES:BX</code> rather than the default <code>DS:BX</code>.</p>
<h3 id="section-9.5">9.5 Interfacing to Borland Pascal Programs</h3>
<p>Interfacing to Borland Pascal programs is similar in concept to
interfacing to 16-bit C programs. The differences are:</p>
<ul>
<li>
<p>The leading underscore required for interfacing to C programs is not
required for Pascal.</p>
</li>
<li>
<p>The memory model is always large: functions are far, data pointers are
far, and no data item can be more than 64K long. (Actually, some functions
are near, but only those functions that are local to a Pascal unit and
never called from outside it. All assembly functions that Pascal calls, and
all Pascal functions that assembly routines are able to call, are far.)
However, all static data declared in a Pascal program goes into the default
data segment, which is the one whose segment address will be in
<code>DS</code> when control is passed to your assembly code. The only
things that do not live in the default data segment are local variables
(they live in the stack segment) and dynamically allocated variables. All
data <em>pointers</em>, however, are far.</p>
</li>
<li>
<p>The function calling convention is different; it is described in
<a href="#section-9.5.1">section 9.5.1</a>.</p>
</li>
<li>
<p>Some data types, such as strings, are stored differently.</p>
</li>
<li>
<p>There are restrictions on the segment names you are allowed to use:
Borland Pascal will ignore code or data declared in a segment whose name it
does not recognize. The restrictions are described in
<a href="#section-9.5.2">section 9.5.2</a>.</p>
</li>
</ul>
<h4 id="section-9.5.1">9.5.1 The Pascal Calling Convention</h4>
<p>The 16-bit Pascal calling convention is as follows. In the following
description, the words <em>caller</em> and <em>callee</em> are used to
denote the function doing the calling and the function which gets called.</p>
<ul>
<li>
<p>The caller pushes the function's parameters on the stack, one after
another, in normal order (left to right, so that the first argument
specified to the function is pushed first).</p>
</li>
<li>
<p>The caller then executes a far <code>CALL</code> instruction to pass
control to the callee.</p>
</li>
<li>
<p>The callee receives control, and typically (although this is not
actually necessary, in functions which do not need to access their
parameters) starts by saving the value of <code>SP</code> in
<code>BP</code> so as to be able to use <code>BP</code> as a base pointer
to find its parameters on the stack. However, the caller was probably doing
this too, so part of the calling convention states that <code>BP</code>
must be preserved by any function. Hence the callee, if it is going to set
up <code>BP</code> as a frame pointer, must push the previous value first.</p>
</li>
<li>
<p>The callee may then access its parameters relative to <code>BP</code>.
The word at <code>[BP]</code> holds the previous value of <code>BP</code>
as it was pushed. The next word, at <code>[BP+2]</code>, holds the offset
part of the return address, and the next one at <code>[BP+4]</code> the
segment part. The parameters begin at <code>[BP+6]</code>. The rightmost
parameter of the function, since it was pushed last, is accessible at this
offset from <code>BP</code>; the others follow, at successively greater
offsets.</p>
</li>
<li>
<p>The callee may also wish to decrease <code>SP</code> further, so as to
allocate space on the stack for local variables, which will then be
accessible at negative offsets from <code>BP</code>.</p>
</li>
<li>
<p>The callee, if it wishes to return a value to the caller, should leave
the value in <code>AL</code>, <code>AX</code> or <code>DX:AX</code>
depending on the size of the value. Floating-point results are returned in
<code>ST0</code>. Results of type <code>Real</code> (Borland's own custom
floating-point data type, not handled directly by the FPU) are returned in
<code>DX:BX:AX</code>. To return a result of type <code>String</code>, the
caller pushes a pointer to a temporary string before pushing the
parameters, and the callee places the returned string value at that
location. The pointer is not a parameter, and should not be removed from
the stack by the <code>RETF</code> instruction.</p>
</li>
<li>
<p>Once the callee has finished processing, it restores <code>SP</code>
from <code>BP</code> if it had allocated local stack space, then pops the
previous value of <code>BP</code>, and returns via <code>RETF</code>. It
uses the form of <code>RETF</code> with an immediate parameter, giving the
number of bytes taken up by the parameters on the stack. This causes the
parameters to be removed from the stack as a side effect of the return
instruction.</p>
</li>
<li>
<p>When the caller regains control from the callee, the function parameters
have already been removed from the stack, so it needs to do nothing
further.</p>
</li>
</ul>
<p>Thus, you would define a function in Pascal style, taking two
<code>Integer</code>-type parameters, in the following way:</p>
<pre>
global  myfunc 

myfunc: push    bp 
        mov     bp,sp 
        sub     sp,0x40         ; 64 bytes of local stack space 
        mov     bx,[bp+8]       ; first parameter to function 
        mov     cx,[bp+6]       ; second parameter to function 

        ; some more code 

        mov     sp,bp           ; undo "sub sp,0x40" above 
        pop     bp 
        retf    4               ; total size of params is 4
</pre>
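<p>A function that returns a value follows the same pattern. The sketch
below is illustrative only: the name <code>AddInts</code> and the Pascal
declaration shown in its comment are hypothetical. It returns the sum of
its two <code>Integer</code> parameters in <code>AX</code>, and needs no
local stack space, so <code>SP</code> does not have to be restored from
<code>BP</code>.</p>
<pre>
global  AddInts

segment CODE                    ; a segment name Borland Pascal accepts

; Assumed Pascal declaration:
;   function AddInts(A, B: Integer): Integer; far; external;

AddInts:
        push    bp
        mov     bp,sp
        mov     ax,[bp+8]       ; A: first parameter, pushed first
        add     ax,[bp+6]       ; B: second parameter, pushed last
        pop     bp
        retf    4               ; result in AX; remove 4 bytes of parameters
</pre>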
<p>At the other end of the process, to call a Pascal function from your
assembly code, you would do something like this:</p>
<pre>
extern  SomeFunc 

       ; and then, further down... 

       push   word seg mystring   ; Now push the segment, and... 
       push   word mystring       ; ... offset of "mystring" 
       push   word [myint]        ; one of my variables 
       call   far SomeFunc
</pre>
<p>This is equivalent to the following Pascal declaration and call:</p>
<pre>
procedure SomeFunc(Str: PChar; Int: Integer); 
    SomeFunc(@mystring, myint);
</pre>
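<p>Calling a function that returns a <code>String</code> needs the extra
result pointer mentioned in the calling convention. A minimal sketch
follows, assuming a hypothetical Pascal function
<code>GetName(Id: Integer): String</code> and hypothetical variables
<code>tmpstr</code> and <code>myid</code>. It also assumes that, since the
callee leaves the pointer on the stack, the caller discards it after the
call.</p>
<pre>
extern  GetName

segment DATA

tmpstr  resb   256                ; length byte plus up to 255 characters
myid    resw   1

segment CODE

       push   word seg tmpstr     ; far pointer to the temporary string:
       push   word tmpstr         ; segment first, then offset
       push   word [myid]         ; the Integer parameter
       call   far GetName         ; callee removes only the 2-byte parameter
       add    sp,4                ; discard the result pointer

       ; tmpstr now holds the returned string
</pre>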
<h4 id="section-9.5.2">9.5.2 Borland Pascal Segment Name Restrictions</h4>
<p>Since Borland Pascal's internal unit file format is completely different
from <code>OBJ</code>, it does only a sketchy job of reading and
interpreting the information contained in a real <code>OBJ</code> file
when it links one in. Therefore an object file intended to be linked to a
Pascal program must obey a number of restrictions:</p>
<ul>
<li>
<p>Procedures and functions must be in a segment whose name is either
<code>CODE</code>, <code>CSEG</code>, or something ending in
<code>_TEXT</code>.</p>
</li>
<li>
<p>Initialized data must be in a segment whose name is either
<code>CONST</code> or something ending in <code>_DATA</code>.</p>
</li>
<li>
<p>Uninitialized data must be in a segment whose name is either
<code>DATA</code>, <code>DSEG</code>, or something ending in
<code>_BSS</code>.</p>
</li>
<li>
<p>Any other segments in the object file are completely ignored.
<code>GROUP</code> directives and segment attributes are also ignored.</p>
</li>
</ul>
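<p>Put together, a source file that respects these restrictions might be
laid out as in the following sketch. The symbol names are hypothetical, and
no segment attributes or <code>GROUP</code> directives are given, since
they would be ignored anyway.</p>
<pre>
segment CODE                    ; procedures and functions

global  MyProc
MyProc: retf                    ; placeholder body: far return, no parameters

segment CONST                   ; initialized data

table   dw      1,2,3

segment DATA                    ; uninitialized data

counter resw    1
</pre>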
<h4 id="section-9.5.3">9.5.3 Using <code>c16.mac</code> With Pascal Programs</h4>
<p>The <code>c16.mac</code> macro package, described in
<a href="#section-9.4.5">section 9.4.5</a>, can also be used to simplify
writing functions to be called from Pascal programs, if you code
<code>%define PASCAL</code>. This definition ensures that functions are far
(it implies <code>FARCODE</code>), and also causes procedure return
instructions to be generated with an operand.</p>
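<p>Defining <code>PASCAL</code> does not change the code which calculates
the argument offsets: the first argument declared still receives the lowest
offset. Since the Pascal convention leaves the <em>last</em> parameter
nearest to <code>BP</code>, you must declare your function's arguments in
reverse order. For example:</p>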
<pre>
%define PASCAL 

proc    _pascalproc 

%$j     arg 4 
%$i     arg 
        mov     ax,[bp + %$i] 
        mov     bx,[bp + %$j] 
        mov     es,[bp + %$j + 2] 
        add     ax,[es:bx]      ; j is a far pointer, so read through ES

endproc
</pre>
<p>This defines the same routine, conceptually, as the example in
<a href="#section-9.4.5">section 9.4.5</a>: it defines a function taking
two arguments, an integer and a pointer to an integer, which returns the
sum of the integer and the contents of the pointer. The only difference
between this code and the large-model C version is that <code>PASCAL</code>
is defined instead of <code>FARCODE</code>, and that the arguments are
declared in reverse order.</p>
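<p>For comparison, a hand-written routine doing the same job might look
like the sketch below. It assumes the stack layout given in
<a href="#section-9.5.1">section 9.5.1</a>, with parameters starting at
<code>[bp+6]</code>, and uses a hypothetical name without a leading
underscore. It is not the literal expansion of the macros.</p>
<pre>
global  pascalproc

pascalproc:
        push    bp
        mov     bp,sp
        mov     ax,[bp+10]      ; i: first Pascal parameter, pushed first
        mov     bx,[bp+6]       ; j: offset half of the far pointer
        mov     es,[bp+8]       ; j: segment half of the far pointer
        add     ax,[es:bx]      ; add the integer that j points to
        pop     bp
        retf    6               ; 2 bytes for i plus 4 bytes for j
</pre>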
</div>
</body>
</html>
